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Abstract 

We present an optimal control approach to the problem of model calibration for 
Levy processes based on a non parametric estimation procedure. The calibration 
problem is of considerable interest in mathematical finance and beyond. Calibration 
of Levy processes is particularly challenging as the jump distribution is given by an 
arbitrary Levy measure, which form a infinite dimensional space. In this work, we 
follow an approach which is related to the maximum likelihood theory of sieves |21j . 

The sampling of the Levy process is modelled as independent observations of the 
stochastic process at some terminal time T. We use a generic spline discretization 
of the Levy jump measure and select an adequate size of the spline basis using the 
Akaike Information Criterion (AIC) |13j . 

The numerical solution of the Levy calibration problem requires efficient op¬ 
timization of the log likelihood functional in high dimensional parameter spaces. 

We provide this by the optimal control of Kolmogorov’s forward equation for the 
probability density function (Fokker-Planck equation). The first order optimality 
conditions are derived based on the Lagrange multiplier technique in a functional 
space. The resulting Partial Integral-Differential Equations (PIDE) are discretized, 
numerically solved and controlled using scheme a composed of Chang-Cooper, BDF2 
and direct quadrature methods. For the numerical solver of the Kolmogorov’s for¬ 
ward equation we prove conditions for non-negativity and stability in the norm 
of the discrete solution. To set boundary conditions, we argue that any Levy process 
on the real line can be projected to a torus, where it again is a Levy process. If the 
torus is sufficiently large, the loss of information is negligible. 
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1 Introduction 

Levy processes play a large role in contemporary mathematical hnance [15], but also 
in many areas of physics, see e.g. El ESj. A real valued Levy process is a stochastic 
process Y{t) that has increments Y{t) — Y{s), t > s, that are independent of the past. 
The increments are also stationary in the sense that the probability distribution of the 
increment only depends on the time difference t — s. Furthermore, K(0) = 0 and a 
stochastic continuity condition for t = 0 holds, see e.g. [2|. Under the given conditions, 
the characteristic function of Y(t) is given by the Levy Khinchine representation 

E (1) 

E[-] stands for the expected value, '^(fc) is a conditionally positive dehnite function [8] 
that has the following representation in terms of the canonical triplet {b, cx^, z/): 

2 r 

'ip{k) = ibk —— 1 — *s/cl{|s|<i}(s)) (iz/(s). (2) 

2 Jr\{o} 

6, cr^ G M are constants, > 0, and the Levy measure u is a positive measure on M \ {0} 
such that 

/ min(l, s^)(iz/(s) < oo. (3) 

Jr\{0} 

In Eq. g 1{|,| <i}(s) is the characteristic function of the set {|s| < 1} which takes the 
value 1 on this set and 0 otherwise. 

The calibration problem for Levy processes consists of the estimation of the canon¬ 
ical triplet (6, cT^,z/) given the observation Y(tj) of the process’ trajectory Y(t) at some 
prescribed times tj, j = 1,L. For instance, Y(t) could be the process of log-Returns 
of some asset and tj could be the closing time of the j-th trading day (’historic low fre¬ 
quency data’). As the j-th increment of the process Yj = Y(tj) — Y(tj-i) has the same 
distribution as Y{T), if T = tj — tj-i, from a statistical point of view this is equivalent 
to the L-fold independent observation of the terminal values Y{T) at time T. 

Y(t) can also be understood as the solution to the Stochastic Differential Equation 
(SDE) of jump-diffusion type 

dY{t) = bdt + adB{t) + / yN{dt,dy)+ I yN{dt,dy), K(0) = 0. (4) 

J{\y\<l} M{| 2 ;|> 1 } 
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Here B{t) is a standard Brownian motion and N{{t,t + At], A) ~ Po(z/(H)At) is the 
random connting measnre of jumps of height in the set H C M in the time interval 
{t, t+At]. Po(A) stands for the Poisson distribution with intensity A and N{{t, t+At], A) = 
N{{t, t+At], A) — i>{A)At is the compensated or Martingale jump measure for small jumps, 
where we require H C M \ {|x| < £} for some £ > 0, see [2] for further details. 

The calibration problem for Levy processes, respectively the solution of Q, unfortu¬ 
nately is ill posed: The collection of all Levy measures u is inhnite dimensional, while 
only L observations are available. Direct application of the maximum likelihood principle 
in this situation leads to severe over-htting issues [2T]. In many applications, one chooses 
families of Levy measures I'ia) that depend only on a hnite dimensional parameter vec¬ 
tor a, see e.g. m- Furthermore, one often restricts to such parametrizations, where 
the density f{x,T,a) of the probability distribution of Y{t) can be calculated explicitly 
or at low numerical cost. One then assumes that the true distribution of Y{t) is inside 
the prescribed set and uses the maximum likelihood approach for calibration [12]. This 
assumption might however not be justihed and give rise to modelling errors. 

As a non-parametric alternative, one can use generic parametrizations for the density 
of the Levy measure u that can be rehned depending on the amount of data available. 
This gives rise to a hierarchy - or sieve [21] - of maximum likelihood problems with a hnite 
number of parameters. If a suitable hnite parametrization has been chosen, it remains 
to solve the maximum likelihood estimate at a given level of parametrization. One also 
has to determine this level based on the quality and also stability of the hts obtained. 
The resulting densities can no longer be calculated analytically. Also, solution of the 
maximum likelihood problem gives rise to high dimensional optimization problems. 

The maximum likelihood method requires a parametric representation of the proba¬ 
bility density functions (PDF). The PDF can however be obtained as a solution to the 
Kolmogorov forward equation (Fokker Planck equation). The parameters a then enter in 
this equation via coefficients in the generator of the semigroup [2]. If the Levy measure 
u is not zero, the generator of both these equations does not only contain a 2nd order 
partial differential operator, but also an integral operator of convolution type. This places 
the model calibration problem in the framework of optimal control problems with partial 
integral differential equations (PIDE) constraints. 

Indeed, we know that the Kolmogorov forward equation is representative of a stochas¬ 
tic process described in terms of SDEs such as that one of Eq. (Q, where the set of 
parametrization for the approximation of the PDF, would correspond to a set of controls 
of the stochastic dynamic equation, so that, jointly to the maximum likelihood problem, 
it corresponds to a stochastic optimal control problem. The classical way to deal with 
the optimal control of stochastic process is by the Dynamic Programming principle and 
the related Hamilton-Jacobi-Bellman equation for stochastic processes [9]. However, this 
problem has been recently framed as a constrained PDE optimization problem, where the 
PDE is the Fokker-Planck, i.e. Kolmogorov forward, equation Ellis]. Following this 
framework, the solution of the maximum likelihood problem, i.e. the stochastic optimiza- 
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tion, is found by solving the first order optimality conditions in a functional space, that is 
the optimality system consisting of two PIDEs, named forward and backward (or adjoint) 
equations, plus an optimality condition. 

This optimality system can be numerically solved by a gradient-based iterative al¬ 
gorithm as follows. The Kolmogorov forward equation has a set of control parameters 
in order to maximize the log-Likelihood functional for its terminal value. These controls 
involve the Kolmogorov backward equation (adjoint equation) with suitable terminal con¬ 
dition, corresponding to the log-likelihood functional. Hence, given an initial approxima¬ 
tion of the unknown parametrization, hrst solve the forward equation, then set up the 
terminal condition and solve the adjoint one. With both the forward and adjoint solu¬ 
tions, by using the optimality condition equation the gradient is computed. Then with a 
descending gradient technique, such as a non linear conjugate gradient method, found a 
better approximation of the control parameters and repeat until the satisfying accuracy 
for the parametrization is found. 

Since this maximum likelihood problem could have an high dimensional space ad 
a huge number L of observations, a fast, stable and enough accurate numerical solver 
for the PIDE is required. In our case the Kolmogorov forward equation is a PIDE of 
parabolic differential operator type. Such kind of PIDEs are, e.g., investigated in the 
option pricing models as a generalization of the Black-Scholes equation. The hrst difficulty 
to numerically solve this equation is the integral. In fact, in the case of using a fully 
implicit method, it would lead to solve a dense system of equation, for this reason implicit- 
explicit (IMEX) or operator splitting methods can be applied to bypass this problem (see 
[m [ini IS]). The solution of the Kolmogorov forward equation is a probability density 
function that is non negative with constant integral over the domain. Such properties 
must be owned from the discrete solution too. The Chang-Cooper (CC) is a non-negative 
and conservative numerical method that has been used to solve the classical Fokker- 
Planck equation [H la E]- Here, we use a numerical method that can be classihed 
as IMEX, since we use the CC method with an implicit time difference scheme for the 
differential operators of our PIDE, and evaluate the integral operator at the previous 
time step solution, i.e. in an explicit way. We prove for the resulting numerical solver: 
conservativeness, non-negativity and stability in the L^-norm. The numerical solver for 
the adjoint equation is obtained directly from the solver for the forward equation by using 
the “discretize then optimize” approach to the optimization problem. 

Finally, we quote, that for related work with vanishing Levy measure see e.g. 
and for estimation procedures based on non parametric approximations of the empirical 
characteristic function see e.g. [7|. An approach based on the method of moments and 
asymptotic expansions of Levy densities can be found in ra. 

The article is organised as follows: In the following Section]^ we describe the hierarchy, 
of estimation problems. We shall show that the estimation problems that can actually 
be solved numerically can come arbitrarily close, at increasing computational cost, to the 
fully general Levy estimation problem. We also show that the use of periodic boundary 
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conditions in the Kolmogorov equations can be understood in terms of mapping the 
original Levy process on the real line to a derived Levy process on the torus. 

In Section we set up the maximum likelihood estimation problem for a given 
parametrization and derive Kolmogorov’s forward (Fokker-Planck) equation and its ad¬ 
joint (Kolmogorov backward) equation with terminal conditions set by the log-Likelihood 
objective functional. This maximum likelihood estimation problem is solved in the frame¬ 
work of the Fokker-Planck optimal control of stochastic processes, as a constrained PDF 
optimal control problem. 

In Section the discretization for Kolmogorov’s equations and the optimal control 
scheme is derived following a Chang-Cooper and IMEX approach. In particular we prove 
the structural properties of the numerical solution, i.e. conservativeness, non-negativity 
and stability. 

Section 1^ gives numerical tests for the consistency of the proposed procedure based on 
simulated data. We propose to use Akaike’s information criterion (AIC) [13] to choose an 
adequate parametrization from the hierarchy of spline parametrizations for density of the 
Levy measure. Three different tests are performed: We first fit data that are simulated 
from a given distribution within our hierarchy of Levy distributions. The fits obtained 
are shown to be of very good quality and AlC-selection criterion reproduces almost the 
original parametrization. As a second test we ht simulated data from a bi-directional 
Gamma process, i.e. the difference of two independent Gamma processes |2|, which is not 
inside one of the parametrizations of degree Nq, but can be approximated by those. The 
bi-directional Gamma process is augmented by a small diffusive component and projected 
to the torus. The AIG selection criterion and the htting results again reproduce the hnal 
distribution of this process rather adequately. As a final test, we select financial data 
from the German stock exchange DAX in a period between April 1998 and March 2002 
and consider daily log-returns over 1000 trading days. This period represents a rather 
stable period for the DAX with an almost constant level of the volatility. After projection, 
the AIG based method selects a six-parameter spline approximation of the Levy measure 
density. The resulting hts again give a decent reproduction of the empirical distribution. 

Our conclusions and an outlook are given in the hnal Section 

2 A Hierarchy of Parametrizations for Maximum Like¬ 
lihood Estimation of Levy Data 

The estimation problem for the Levy measure u is plagued by several issues. Here we 
take a step by step approach towards the derivation of a hierarchy of estimation problems 
that approximate the original one. 

Let f{x\a) = f{x\a,NQ), a G be a family of probability density functions with 
variable dimension Nq of the parameter set a, and let f{x) be the (unknown) probability 
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density of Yt. 

For Nq fixed, we apply the Maximum Likelihood method to select an estimated value 
a. based on increasing sample size. It is known from the general theory of Maximum 
Likelihood [I9] that the estimated a converges almost surely to the true value d, provided 
f{x) = f{x\a,NQ) holds. 

This leaves the question open, which parametrization - or which value for Nq- one 
should choose. We solve this problem by maximizing the Akaike Information Criterion 
(AIC). Maximizing the AIC corresponds to minimizing (asymptotically) the expected 
Kulback-Leibler distance, or relative entropy, between the true distribution f{x) and its 
parametric estimate f{x\a^NQ). See [13] [Chapter 6] for a detailed derivation. 

Let us now dehne the parametrizations with Nq parameters. We intend to show that, 
for sufficiently large Nq, we can approximate the original Levy distribution to an arbitrary 
precision. This requires several steps of approximation: 


Truncating small and large jumps. The total mass of the Levy measure, A = z/(M \ 
{0}), can be inhnite. This quantity dehnes the average number of jumps per unit time 
of the associated Levy process [2]. The easiest way to deal with this is to truncate small 
jumps by setting = l{|s|>£}Z/ which is a hnite measure by equation ([^. Using ^ and 
dominated convergence, one can moreover prove that ipeik) —t for G M as e \ 0. 

Here is given by ([^ with u replaced by By the continuity theorem of Paul Levy, 

see e.g. [HI Theorem 23.8], this then implies (weak) convergence in law of the respective 
probability distributions. 

At the same time, a hnite Levy measure u permits one to re-parametrize 'ip{k) of Eq. 

0 


via 


a 


■^(fc) = ibk — '^k‘^ + 


'R\{0} 


- 1) diy{s) 


(5) 


with b ^ b — sl{|s|<i}(s)(iz/(s). In the following we assume ly to be hnite and use 

parametrization ([^. The last term in (|^ now has the structure of a compound Poisson 
distribution, i.e. V(t) can be represented as Z(t) + where Z ~ M{bt,a‘^t) is 

normally distributed, N{t) ~ Po(Af) is Poisson distributed with intensity At and Uj 
are i.i.d. random variables with distribution given by the normalized Levy measure, 
Uj ~ A“V. Also Z{t),N{t) and Uj are all stochastically independent. 

With a similar argument, we can cut oh large jumps by replacing u with l{|s|<£-i}^- 
Also in this case, in the limit e \ 0, the truncated Levy distributions converge in law to 
the non truncated one. In the following we thus assume that the support of u is contained 
in some hnite interval The appropriate size of this region can be estimated e.g. 

by the Chebyshev’s inequality using empirical mean and variance from the data. 


Regularizing the Levy measure. Given a non negative, continuously diherentiable 
function y with compact support such that = 1, and setting Xe{x) = fy(f), 
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we define ge{s) = Xe* ^(s) = Xe{s — We consider the regnlarised measnres 

di'^{s) = gs{s)ds. Inserting this measure in (|^, using Fubini’s theorem and dominated 
convergence, one easily shows that 'ipe{k) —)■ 'ip{k). This again implies convergence of the 
related probability distributions in law. 

Spline approximation of the densities. Let thus di>{s) = g{s)ds with p{s) non 
negative, continuously differentiable and with compact support. Let 6*^, j = 1,..., n + 1, 
be a collection of uniform grid points in M such that the support of g{x) is covered 
and 9j — 9j-i = A. Dehne pa(s) by the linear interpolation between points {9j,g{9j)). 
Again, one easily sees that pa(s) converges to g{s) as A \ 0. Also, for small A, the 
functions pa(s) have support in a hxed compact interval and are uniformly bounded by 
the maximum value of g{s). If we insert measures du^{s) = gA{s)du{s) into (|^, this 
expression converges to the related one with di>{s) = g{s)ds. This suffices to prove that 
an approximation of the probability distributions (in law) is feasible with 1st order spline 
densities. 

Fixing drift and diffusion terms. One might or might not like to include the drift 
and diffusion term determined by b and into the estimation problem. Although, in 
general, these quantities have to be estimated, here we keep a small hxed value for cr^ 
for reasons of numerical stability of the Kolmogorov equations. As long as this value 
underestimates the true diffusion, this corresponds to a splitting of the Levy process 
Y{t) = Z{t) + L{t) into stochastically independent components where Z{t) is determined 
by the purely Gaussian Levy triplet (6,cr^,0). L(t) will be a Levy process that contains 
drift, the excess diffusion and jumps. However, the distribution of L{t) at time T can 
be approximated in the sense of convergence in law by compound Poisson distributions 
without drift and Levy terms. The explicit construction can e.g. be found in |8]. We can 
thus approximate our estimation problem to arbitrary accuracy with a problem where 
drift and diffusion take hxed values. We also note that a non vanishing dihusion implies 
the existence of a probability density function f{x, t) for the distribution of Y (f). 

Periodic boundary conditions from the projection to a torus. Another problem 
with the Kolmogorov forward equation is the issue of boundary conditions. We have 
already shown that we can approximate the estimation problem by one where the Levy 
measure u has support inside a large interval [f2a,f2b]. Let G be the torus with 

the end points of the interval identihed. Let K = Qi, — Qa, then dehne the operator +k 
as X +K y ■= {x + y) mod (iP) a group operation on G, with (■) mod (■) the modulus 
operation. Let furthermore 

0 • +) (G, + k ) (6) 

be the group homomorphism dehned by 0(a;) = {x) mod (iP). Let X{t) = (j)(Y(t)) be a 
stochastic process on G. If Y{t) is a Levy process on the group (M, +), the same applies 
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to X{t) with respect to the group (fi, +k)- Note that in the dehnition of Levy processes 
only the group structure of (M, +) enters. Levy processes are naturally dehned on locally 
compact Abelian groups like (M,+) or also (hi,+/f), see [H]. Let us now consider the 
characteristic function of the hi-valued process X{t) at time t. By the periodicity of hi, 
only k values from are needed. To derive a Levy - Kinchine formula ([^ for X (t) on 
n from that of Y (t) on M, we consider for such values of k 






Inserting (|^, we obtain 

il){k) 


ibk - 


+ 

/ 

^^isk _ 

1 ) 

dp{s) 




/R\{0} 


ibk - 


+ 

/ 

^^i4>is)k 

— 

l) dv{s) 




/R\{0} 


ibk - 


+ 

/ 

(gisfc _ 

1 ) 

d(p^iy{s), 




Jn\{o} 



(7) 


( 8 ) 


with 0*1/ the image measure on u under 0. Note that under the hypothesis that v has 
support in [hla, can be reconstructed from 0*i/ as u = 00^0*i/, where 0“^ : hi —)• M 

is the natural embedding. Using this, we identify u and 0*z/ in the following. 

Summing up, we consider the maximum likelihood parameter estimation problem 
with fixed a piecewise linear density function with Nq grid points for the finite Levy 
measure and periodic boundary conditions on hi. By refining the grid for the linear inter¬ 
polation, enlarging the size of the torus and letting the fixed diffusion go to zero, we can 
approximate the distribution of any Levy process with one of our candidate processes in 
the topology set by weak convergence in law. This constitutes the hierarchy of maximum 
likelihood estimation procedures. 


3 Kolmogorov Equations and Optimality for the Log- 
Likelihood 

In this section we formulate the optimization problem for the maximum likelihood param¬ 
eters estimation. The maximum likelihood estimator as an optimizing objective functional 
is given together to the Kolmogorov forwards BIDE as constraints. The optimality sys¬ 
tem is written by using the Lagrange multipliers method in a functional space, by also 
including the Karush-Kuhn-Tucker conditions for the non negativity of the optimizing 
parameters. 


Objective functional and forward equation. Let the L independent sample values 
Xi,..., Xl be given, and X; G / = 1,..., L, where fib]. These values can 

e.g. be obtained as X; = 0(T]), where 0(-) is the group homomorphism dehned in Eq. 
(|^ and Yi is the Levy process on M. We deal with the problem to hnd the PDF of X(T) 
such that it best hts with the sample values. For this purpose we consider the maximum 
likelihood problem in the framework of PIDE-constrained optimization: We have to hnd 
the maximum likelihood estimator 


maxJ(/, a), (9) 

Q; 

with respect to the parametrization of the measure given by d, where 

1 ^ 

J{f,a) = -J2^og{f{Xi,T,a)), (10) 

^ i=i 


is the (normalized) log-Likelihood with the constraint given by the following Kolmogorov 
forward (Fokker-Planck) equation for the Levy process X{t) on the torus fl with Levy 
data {b,a‘^,Ua) using the parametrization (5) and duais) = Q:j0j(s)ds: 

dtf{x, t) + bd^f{x, t) - t) - + s,t) - f{x, t))Qj{s)ds = 0 

f{x,0) = fo{x) 


^ /(D„,t) =/(Db,t), d^f{na,t) = d^f{nb,t), 

( 11 ) 

where f{x,t) represent the PDF of the process at time t. This PIDE is dehned in the 
interval of time t G [0, T], and with periodic boundary conditions on = 0,^. Here Qj{s) 
is a set of triangular shaped basis for the set of continuous functions that are linear on 
see the preceding section, 


0i('5) — 1 + (s - 0j)/X s G [9j - A, 9j] 
= 1 - (s - 9j)/A s G [9j, 9j + A], 


where 9j = + A(j — 1) for j = 1,... Nq + 1, are the points of a discrete uniform mesh 

of step size A = {fib — fia)/Ne dehned on the domain. The periodicity 0i(s) = QNe+i{s) 
is assumed. 

The existence and uniqueness of the solution of the Fokker-Planck equation (0 is 
well established, also for initial conditions belonging to the class of measures [8]. 


The optimality system. If we write the mapping a —)■ /(d) between the maximization 
parameters and the PDF, then we introduce the so-called reduced cost functional j(d) = 
J(/(d),d), so that the maximization problem becomes 

max j(d) = max J(/(d), d) (12) 
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A local maxima a* for J can be found by solving the optimality system obtained by 
vanishing the variations of the following Lagrangian functional 


>C(/,p,tt,7r) =y ^log(/(Xi,T,a)) + / [ 
^ Jo Jn 


t) + bdj{x, t) - t) 

/=! v' u v' L ^ 

Ne -.AT© 

— / {f{x + s,t) — f{x,t))Qj{s)ds p{x,t)dx dt — (13) 

i=i -I i=i 


where n = (tti, ..., TfArg,) and a fulhl the usual Karush-Kuhn-Tucker (KKT) conditions 
TTjttj = 0 and 71 j > 0. These are important to include the non-negativity constraints for the 
control variables. Note that if the condition aj > 0 is violated for some j G {1,..., Nq}, 
the density of the measure dua{s) is negative in a neighbourhood of 6j and thus is not 
a Levy measure any more. The sum should be extended only on the active 

constraints, i.e. when a*, = 0. For those values of j on the maximum where a* > 0 we 
have TT* = 0. 

First we calculate the variation £(/ -|- 5f) — C{f) for the adjoint equation. In the 
following the variations are calculated separately for each addend of the r.h.s. We get 


Nq 

^3 / 




Nq pT 

=5:“ ^ 



0 JQ Jn 


^3 


J=1 


0 Jn ^ Jn 


—Sf{x, t))Qj{s)p{x, t)ds dx dt 
Qj{s)ds\5f{x, t)p{x, t) dx dt. 


(14) 


For the term — aj Sf(x + s, t)Oj(s)ds)p(x, t) dx dt we apply the substitution 

y = X + s., then exchange x y, so that it recasts to — aj ( J^piy, t)&j{x — 

y)dy)Sf{x,t)dxdt. Then, again, we substitute s = x — y and, by inserting also the former 
term, we get 



0 Jn 


Nq 

E 

'1=1 


aj / {p{x — s,t) — p{x))Qj{s)ds 6f{x,t)dxdt. 


in 


(15) 


For the variation of the time derivative, integrating by parts, one obtains 


6f{x,t)p{x,t)\Qdx— / dtp{x,t) 6f{x,t)dxdt. 


(16) 


The variation Sf{x, 0) = 0 holds because of the Cauchy initial condition, while the vari¬ 
ation in T can be dehned in some points Xi. Next, we integrate by parts the term 
with the hrst order derivative in x and obtain b (df(x, t)p(x, t)) In dt. 
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Due to periodicity in the first term, — p{Qa,t)) has to be zero, hence 

p{^b,t) =p{na,t). 

From the diffusive term we get 


-y ^ / dl6fp{x,t)dxdt =-^ j [dxdfp-5fdxp\^^'’^dt+ / dlp{x,t)dxdt. (17) 



a 


Ub , 


0 Jn 


The first boundary term is zero because of the periodic condition of the variation of the 
derivative of / at the boundaries, and because of the previous periodic condition on p. 
The second is analogous and has to vanish, we therefore get the continuity condition 
OxPi^ajt^ dxPi^bjt^- 

By collecting all the terms under double integral, we get the adjoint equation. The 
remaining boundary term df(x, T)p(x, T)dx will be considered below. 

To calculate the variation on / in the functional J we perform an additional integration 
in space, so that 

[ ^^s{f{x,T))5{x - Xi)dx, (18) 

;=1 dn 

where S(.) is the <5—Dirac measure, then variate f{x,T) + Sf{x,T), hence 


/ ^og{f {x,T) + 6f{x,T))S{x - Xi)dx 

Jn 

= [ i^og{f{x,T)) + 5f{x,T)/f{x,T))5{x-Xi)dx, (19) 

Jn 

so that the first order terms plus the remaining boundary, give 

+ [ p{x,T)Sf{x,T)dx. (20) 

Jn J\X^ x ) 

This expression have to be zero for each 6f{x,T). It represents the terminal condi¬ 
tion for the adjoint equation; that is p{Xi,T) = —l/{Lf{xi,T)), and p{x,T) = 0, 
if a; 7^ {Xi,...,Xi}. In case of multiplicity of Xi the condition becomes p{Xi,T) = 

with /' running on the multiplicity value. 

Summarizing, the adjoint equation (Kolmogorov’s backward equation) is as follows: 
-dtp{x, t)-h dxp{x, t) - —dlp{x, t) - aj(p(x p(x, t))0j(s)ds = 0 

p(Qa, t) = p(Db, t), dxPi^a, t) = dxPi^b, t). 

( 21 ) 
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We note that by reverting the sign of the time we get the same PIDE as the forward 
equation (up to a reflection of the drift and jump direction), hence this equation has a 
unique solution, also for the non regular hnal value problem |H]. 


Second, we variate in Eq.(13) the fitting parameters C-{ptj + Saj) — C{aj), from which 
we found the optimality equations: 


7Tji 



0 Jn Jn 


{f{x + s,t)—f{x,t))p{x,t)Qj{s)dsdxdt = 0, j = l,...,NQ, (22) 


where j' runs on the set of values where a*, = 0. Note that the active vr,/ do not change the 
gradient, but simply balance non-zero gradient components that point to the directions 
where the inequality constraint ay > 0 is violated. As in our case we deal with simple box- 
constraints on the aj themselves, we can set those components of the negative gradient 
equal to zero that correspond to an active index j' and are negative, when determining 
the update. This then accounts for the effect of the vry, see e.g. [201 [2S]. 


The order necessary optimality system consists of the Eqs. (11), (21) and (22). Its 


solution gives values aj,..., that are candidates for maximizing the functional (10). 
Note that maximum likelihood fits in most cases do not correspond to convex optimization 
problems and one always has to account for the perils of local minima that are sub-optimal 
globally. 


Forward equation in flux form. The forward equation 0 , can be written in flux 
form: dtf{x,t) = dxJ^{x,t), where J^{x,t) is the flux defined as 


( 7 ^ \ 

J'{x,t) =-bf{x,t) + Ydxf{x,t)+ Y^aj J {^J f{x - y,t)dyjQj{s)ds. 


(23) 


By using f{x - y)dy = f(x - y)dy = f{z)dz = f{x + s) - f{x), it i 


easy to verify that Eq. (23) is equivalent to Eq. rtll|. Further, from the conservation 


IS 


of the total probability, it follows that the flux has the periodic boundary condition 
^'{Qayt) = From this we immediately get the periodic condition on the first 

derivative = d^f{Vtb,t). 


4 Numerical Scheme 

The numerical solution of the optimality system is found by a non linear gradient conjugate 
iterative procedure [23l |29l 0]. At each iteration the solution of two PIDEs, the forward 
and the adjoint one, must be found. In particular the structural properties of the PDF 
solution must be satisfied, as well as a stability condition of the PIDEs numerical scheme 
solver. 
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For the numerical discretization of the Kolmogorov forward equation we use the 
Chang-Cooper scheme (CC) [H], joint to a 2nd order backward differentiation formula 
(BDF2) for the discrete time operator. The CC method was proposed for a Fokker-Planck 
resp. Kolmogorov equation [1] without the integral term. It is stable, second-order accu¬ 
rate, non-negative, and conservative numerical scheme HE]- 

The CC method is used for the differential operators, the integral term is treated 
separately according to an IMEX methodology. We denote the following B = —b and 
C = ct^/ 2, then the Kolmogorov forward equation reads as follows 


Nb 


= d^F{x,t) + / + s,t) - f(x,t))&j(s)ds, 


(24) 




where 


F{x,t) = B f{x,t)+ Cd:cf{x,t). (25) 

Consider a uniform grid of size h on the space domain {r2h}h>o given by hl/j = {x G M : 
Xi = ih + ria, i = 0,N, h = (12^ — fla)/N} and a uniform grid on the time domain 
ht = {t ^ [0,T] ■. tm = m6t,m = 0,...,NT,5t = T/Nt}. Let denote 

the approximated values of the continuous solution of the FPE. We employ the following 
discretization of (23) 


dsnfr^ = - ^-1/2) + Qifr, «), 


where 


O- fm _ 
^BDJi ~ 


3/r-4/r~^ + /, 

26t 


pm—2 


(26) 


(27) 


is the BDF2 operator. Q(//";a) is the sum of the integrals of Eq. (23) calculated with 
the mid-point scheme 


Ne 


N 


(?(/”;«) = ft 5^ aj 5^ 

j=l k=l 


(28) 


where a = (cki, ... ,aAre), A™ ~ {f{xi + Sk,tm))n^ represents the translated Z™ by the 
value Sk G flh and continued by periodicity, 6jk = Qj{sk), 


a = 


Ne 


N 




X". 

k=l 


jk 



(29) 


Note also that the summation starts from k = 1, because the point A; = 0 is the same 
of that k = N. Therefore, the solution at a new time step is calculated by solving the 
following equation for the unknown 

3fr*' - - fr' + « Q(/r; a). (ao) 
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with the initial condition 


/° = /o.. (31) 

This scheme is based on the fluxes at N cell boundaries. The partial flux at the 
position Xi+hi 2 is computed as follows 


pm+l 

^i+ll2 


{1-6)B + -C 




m+l 

i+1 




■SB /, 




(32) 


This formula results from the following linear convex combination of / at the points 
i and i + 1: 

/+/2 = (l-'5)/+‘+i/r‘. ielo.l], (33) 

The idea of implementing this combination was proposed by Chang and Cooper in [H] 
and it was motivated with the need to guarantee positive solutions that preserve the equi¬ 
librium configuration. Indeed, the CC method is related to exponential fitting methods, 
such as that one proposed by Allen and Southwell P, and by the Scharfetter-Gummel 
discretization scheme [28]. The value of the parameter his5 = l/ta — l/(exp(ta) — 1), 
where w = hB/C, which can be shown to be monotonically decreasing from 1 to 0 as w 
goes from —oo to oo. Notice that with the choice of 5 given above, the numerical scheme 
shares the same properties of the continuous FP equation that guarantee positiveness and 
conservativeness. This is a special case of the CC scheme because in the general one, the 
functions B and C may depends on (x, t), hence also 5 may depend on (x,t), too. Both 
the CC scheme m and the mid-point are second order accurate, then a second order 
numerical scheme results. 

Let Z™' = (/™,..., be the discrete solution at the time tm, with omitted due 
to periodicity, and /3 = C/h — SB. The action of the finite difference operator for F™ in 


Eq. (26) reads as matrix A whose elements are defined by 


Ai^i =-/3{1 + u)/h, Ai^i_i = (3/h, Ai^i+i = ujS/h, Ai,Ar = /3/h, Aat,! = 

(34) 

where (3 = B/{u — 1), u = exp(hF/C'). Hence, Af^ := (F™j^y 2 ~ -^- 1 / 2 )/^’ 

Eq. (30) can be written in matrix form, as follows 

Mfm+l ^ 35 ^ 

where M := 31 — 2 St A (36) 


is the matrix coefficients related to Eqs. (30) and (32). We note that this method needs 


of a second starting point, that can be calculated by using a first order Euler scheme with 
a smaller time step size than St. 

The implicit Euler scheme for the Eqs. (24) and ( |32| ) is 

(j - stA)r = r-' + St «). (37) 

These two numerical schemes own some properties that can be easily proved, but we 
list here as remarks. 
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(24) and (32), defined in the periodic 


Remark 1. The Euler-CC scheme (31) to Eqs. 
domain Vl^, is conservative. 

In fact, = O 5 «) = 0 because the set of values of ffi 

the same as f(fi, being the last only translated by k. Hence, X]i=i fr = 


are 


Remark 2. Provided that X]i=i/i"* = 'l2i=ifr > then the BDE2-CC scheme (35) to 


Eqs. ( 24 ) and (32), defined on the periodic domain Vt^, is conservative. In fact for the 
same constraints on A and Q as above, we get the identity 3 = 4 ~ 

E fm—l _ q rm 

i=l Ji ~ Z^i=l Ji ■ 


The positivity of the numerical scheme is proved by using the theorem for the class 
of M-matrix 123. Given a positive matrix E, Eij > 0, we say that M = si — E is a. non 
singular M-matrix if s > p{E), where p{E) is the spectral radius of E. A non singular 
M-matrix has the important property 


M is non singular M-matrix M ^ > 0. 


(38) 


Theorem 1. Let 6t < 1/a, with a defined in (29), then the Euler scheme (31) to Eg. 
( 24 ), defined in the periodic domain klh, is positive preserving. 


Proof. The argument is as follows: let R the matrix operator such that 

Rr = Ojkflk- Such a matrix is non negative because aj and 9jk are. 

The numerical scheme ([3^ can be recast as 


1 htfi 

1 + ^( 1 +.) 


j /’ 


= {1 - aSt)f"'-' + StRf 


cm—1 


(39) 


where A = A — diag(A) is a positive matrix. Provided that > 0 and St < 1/a the 
r.h.s. is a non negative vector. We observe that the matrix on the l.h.s is always diagonal 
dominant, hence it has a convergent regular splitting and consequently is an M-matrix 
[27] . Therefore, (/ — 6tA)~^ is non negative and /”* will be too. □ 


In order to prove the positivity of the BDF2 numerical scheme (35), we need of the 
following Lemma that gives a lower bound to the velocity of decreasing of the solution. 


Lemma 1. Let the vector G be given non negative. Take a number ^ > 1, then 
the solution f'^+i calculated with the Euler scheme of Eg. (31) after q time steps satisfies 
the following inequality 

e-i 


provided that St < 


af -|- / 3(1 -|- uS)/h 


with parameters defined in Eg. (34). 
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Proof. A proof is given for a particular case in m (see also Refs, therein). Here we prove 
it as follows. Given and calculated with (37), let dehne v = — Z™. By 

applying the operator / — 6tA, we get (/ — 6tA)v = — {I — 6tA))f^ + d), i.e. 

(J — 6tA)v = {{f — 1 — 6t{(3{l + oj)/h + a^))I + 5tA + 

where A = A — diag(A) is a positive matrix. Now provided the bound for 6t, then the 
r.h.s. is positive and from Th. [^we get that n > 0. By iterating that inequality q times, 
we get the thesis. □ 

Remark 3. The upper bound on 5t in Lemma^ results to be St < 1/a for ^ > 1, hence 
the condition on the Lemma is stricter than those on non negativity of Thm. 

Now we show a Lemma similar to Lemma [T] valid for the BDF2 scheme. 

Lemma 2. Let 1 < ^ < 3 and St < h{^ — 1)(3 — ^)/{a^h + 2/3(1 + a;)) be the time step 
size of the numerical scheme of Eg. (35) that generates the sequence of vectors f'^ for 
m = 2,3,... from the starting vectors /°, /^. If there exists m* such that — Z^ 

and Z”^* > 0, then — f"^ > 0 for all m > m*. 

Proof. We apply the operator (3/ — 2StA) to v = 

{31 - 2StA)v = ^(3/ - 2StA)f^+‘^ - {31 - 2StA)f^+^ 


> 0 


and use Eq. (35) to the hrst term on the r.h.s. to get 


cm+1 


{31 - 2StA)v = [4^ - 3 - St{a^ + 2/3(1 + u)/h)]f^+^ - ^f^ + St{2A + ^R)f 

where A = A — diag(A) is a positive matrix. We know that (3/ — 2StA) is an M-matrix 
and its inverse is always non-negative. Also 2A + ^R is non negative. Hence, we can prove 
non negativity of v, provided that 

4^ - 3 - St{a^ + 2/3(1 + uj)/h) > 

for a value m = m*, because of the hypothesis — J"** > Q, that also states that 

ym*+i > g_ j] 2 equality is just the bound on St in the assertion that gives a 

positive value for St only when 1 < ^ < 3. □ 


Indeed, this Lemma proves positivity of the numerical solution of Eq. (35), provided 
that Z° ^ 0, and f^f^ — f^ PH- f^ is the second starting value of the numerical scheme, 
that can be calculated with the Euler scheme (37). 


Theorem 2. Let f^ >t) the discrete initial condition (31), and let f^ the second starting 
value calculated with the Euler scheme (31) with an appropriate time step, such that 
^f^ ~ f^ P 0, for 1 < ^ < 3. Then, the BDE2 scheme (35) to Eq. (24), defined in the 
periodic domain PLh, is positive preserving for the solution Z™, with m > 1. 
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Proof. The proof is an application of the Lemmas and 


□ 


In order to establish the stability of the discrete nnmerical schemes of Eqs. (35) and 
(37), we need ineqnalities of the form ||/'"^^|| < iL||/™'|| evalnated in a snitable norm with 
K possibly less or eqnal than 1. We prove that it realizes for the 1-norm with K = 1. 

Theorem 3. Let the positivity condition of Theorem^ he fulfilled, i.e. 5t < 1/a. Then, 


the Euler scheme (31) is stable in the 1-norm, that is ||/”*||i < ||/ 


m—11 


for all m. 


Proof. Let r = 6t/3/h and invert the matrix operator at l.h.s., then Eq. (39) reads as 

-1 


I - 


rA 


r = 

Now we observe that 


1 r(l -h ui) 


1 -h r(l -h u) 


[{l-a6t)f^-^ + 6tRf^-^] 


I - 


rA 


1 r(H- u) 


< 1 - 


rA 


1 r(l -h ui) 


-1 


= l-\-r{l+ u). 


cm—1 1 


Hence, 

\\n\i<\\a-ast)r-^ + 5tRr m. 

Since St < 1/a, all the components of the vectors inside the norm at the r.h.s. are 
positive, so that the modulus for the evaluation of the 1-norm can be removed. Using 
Ya=i Qifr^ d) = 0 as in Rem. 0 we get the statement of the theorem. □ 

Now we can prove the stability of the numerical scheme with BDF2 integration of Eq. 


Theorem 4. Let the positivity condition of the Theorem be fulfilled, i.e. let St he the 
time step size of the numerical scheme of Eq. (35), /° > 0 the discrete initial condition 
(31) and f^ the second starting value evaluated at the time St. If there exists a real number 
f such that f^ > f^/f with 1 < .^ < 3 and St < h{^ — 1)(3 — + 2/^(1 + to)), then 


the BDE2 scheme (35) is stable in the 1-norm, that is ||/™'||i < ||/ 


m—11 


for all m. 


Proof. The numerical scheme (35) can be written as 

= (4 - C)/'" + (c - aSt)r - r~^ + 5tRf 


where R is defined as in Thm. We apply M ^ and evaluate the 1-norm to both sides. 
Following the same calculations as in Thm. i we get that \\M ^||i = 1/3. 

From the bound on St, we note that 


.St<{i- 1)(3 - O/'^ < 4 - 2^3 < 0.536. 


(40) 
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This means that for all ( in the interval 5 — 2 a/3 < ( < 3, it is ( — a St = ^ with ^ E (1,3). 
Now we have that Z’" > 0 by virtue of the positivity condition, — a5t)f'^ — > 0 

by our assumptions, and 4 — ^ > 0, hence is guaranteed that the sum in the r.h.s. is a 
non negative vector and the modulus in the calculation of the 1-norm can be removed. 
Using the property given in Rem. i we conclude that ||/”*'''^||i < ||/™'||i- □ 

Remark 4. Indeed, in the stability Theorem 0 the equality = ||/”*||i holds. 

In fact, because of the conservativeness from Rem. we have = X]i=i 

and under the non negativity condition of Theorem all the components of the vectors 
fr'^^ifr negative, so that the previous conservativeness identity corresponds 

to the 1-norm equivalence. Further, we can state that for these numerical schemes the 
conservativeness and the non negativity imply the stability of the discrete operator. 


Remark 5. We can finally conclude from the Lax equivalence theorem, that for regular 
solutions of the Kolmogorov forward equation f{x,t),{x,t) E [fl,T], provided that the 
hypothesis of Thm. then the numerical scheme of Eq. (35) yields discrete solutions 
that are second order convergent in time and space. 


Remark 6. The non negativity conditions of for the Euler scheme of Lemma\^ and BDF2 
of Lemma can be correspondingly written as 


6t < 


__ 

af + Bcoth{hB/{2C))/h 


and 


6t < 


K-i)(3-0 

2B coth.{h B/{2C))/h 


We note that for h —0 or R —)■ 0 the upper bound for 6t scales as /C. For C —)■ 0 R 
scales as h/B. 


Adjoint equation. The discrete adjoint equation can be found by discretizing the 
Lagrangian function of Eq. (13) and then performing the variations on the discrete 
variables. This is know as the discretize-then-optimize approach (see Ref. [1] for details). 
This technique yields the following discrete adjoint equation 


mV'" = ^ a), (41) 

where Ml is the transpose of M, and Q{p‘^~^^]a) = hJ2j=i(^jJ2k=iP'ik^jk ~ 

PTk ~ 

The numerical stability is given by the same condition for the forward equation, since 
the transpose of the operator M has the same eigenvalues, but in this case the non 
negativity and conservativeness property are not required. 
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Care has to be taken for the discrete terminal condition, since it can not be dehned 


throngh the Eq. (21) for the presence of the (5-Dirac measnre. For this pnrpose we 


discretize the term (18) as follows 

L N rx^+ll2 


1 

I 


L N i‘X^+112 ^ L N 

.. \og{f{x,T))6{x - Xi)dx = - EE log(/(£i,T))l{ 

1=1 i=l 


Xi(l[xi-l/2,x,+l/2)}i 


where Xi are the points of the integral average theorem. Then we use the approximation 
f{xi,T) ^ so that, by performing the variation 5//^^ on this discrete functional, we 
get the discrete terminal condition 


- PT,i - -J ^ l{x,eK-i/2,xi+i/2)}//r^, i - I,... ,N. 


(42) 


1=1 


According to Eq. (21), it completes the formulation of the discrete adjoint problem. 


Discrete gradient. The discrete of the reduced gradient related to the optimality con¬ 


dition of Eq. (22) is calculated with the mid-point quadrature formula. Each component 
j is given by 

Nt N N 

(Dj), := -H E E E( A - /r)pr»;t. (43) 

m=0 2=1 k=l 


where {DaJ)j ~ (V^J) 


r 


Non linear conjugate gradient method. The availability of the discrete gradient 
allows us to implement a non linear conjugate gradient scheme (NLCG) in order to solve 


the optimization problem (12). NLCG represents an extension of the linear conjugate 


gradient method to non-quadratic problems [23l [29l 0]. 

The optimality system is solved by implementing the gradient given by the following 
algorithm: 

Algorithm 1 (Evaluation of the Gradient at a). 


1. Solve the discrete FP equation (30) with given initial condition (31); 


2. Solve the discrete adjoint FP equation (jl) with terminal condition (42); 


3. Compute the approximated discrete gradient D^J by using (43); 
4- End. 
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in a NLCG scheme. The search directions are recursively as 


dk+i — —Qk+i + f^kdk, (44) 

where /c = 0,1,2 ,... in this paragraph stands for the iteration index, = DaJ{ak) is the 
numerical gradient, with do = “do- Let ak an estimation of the best rates at the iteration 
k, the next one for a minimum point are given by 


ttfc+i = ak + ^k dk, (45) 

where > 0 is a steplength obtained with a line-search that satisfies the Armijo condition 
of sufficient decrease of J’s value as follows 


Jiak T ^k dk^ ^ J(ak) T d ^k Jis^k)i dk^u ) 


(46) 


where 0 < d < 1/2; see [25]. Notice that we use the inner product of the U = 
For the formula of /3k we use the formulation due to Dai and Yuan im 


space. 


iDY {gk+i,gk+i)u 


fdr = 


(dfc, yk) 


u 


(47) 


where yk = gk+i - gk- 

Summarizing, the NLCG scheme is implemented as follows 
Algorithm 2 (NLCG Scheme). 

• Input: initial approx, a^, do = — VJ(do); index k = 0, maximum kn 
tol. 


tolerance 


1. While (k < kmax && lls-fcllR^ > tol ) do 


2. Search the steplength > 0, by sequentially shrinking, along dk satisfying (46); 


3. Set Ofc+i = ttfc + ikdk- i.e. Eg. (45), according to the KKT condition, the 


eventually negative components of ak+i are set to 0. 
4- Compute gk+i = VJ(dA:+i) using Algorithm^- 


5. Compute (3^^ given by (45); 


6. Let 4+1 = -gk+i + K dk, he. Eg l44\j 

7. Set k = k + 1; 

8. End while 
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Correction factor for the logarithm in the objective. The numerical evaluation 


of the functional of Eq. (10) has the problem of the logarithm in the points Xi where the 


PDF at the hnal time has vanishing values. Hence, the functional is replaced as follows 


= Y^log(max(e,f(X,,T,a))'), 


(48) 


1=1 


with e = 10-^2^ 

Nearest grid point for sample values The discrete PDF is defined on the mesh 
grid Qh, the sample values Xi used for the evaluation of the PDF are approximated to 
the nearest values of the space mesh grid This approximation affects both the value 
of the functional and the terminal condition for the adjoint equation. 

Von Mises distribution. The initial distribution /o(x) of Eq. ([II]) is set as the following 
von Mises distribution 


p(x;^, n) 


gKcos(27r(x-/4-Oa)/(Oi,-r2a)-7r) 

27iIo{k) 


(49) 


where /o(.) is the modified Bessel function of order 0, and k is the concentration parameter 
that should be taken large in order to approximate the Dirac delta function in zero as 
initial data for the forward PIDE. 


5 Numerical Tests 

In this section we perform the non parametric estimation of Levy density distribution 
function, that is to find the value a = (ai ,... ,aNQ) such that best fits with the given 
data. We present two validation test cases and one application case to finance. 

Testing for Consistency. We perform a numerical test on the consistency of our es¬ 
timation procedure. According to the maximum likelihood technique, consistency here 
means that, if we fix a parametrization Nq and the parameter values, we can (approxi¬ 
mately) reconstruct these values form our estimation procedure and maximization of the 
AIC, provides that a sufficiently large sample from the true distribution is given. We sim¬ 
ulate such a sample using pseudo random realizations for the Levy process X{t). Details 
on the simulation methods can be found e.g. in [24] . 

However note that in our case, cyclic boundary conditions have to be taken into 
account. The data setting for our test case is as follows: the space domain D = [—tt, tt), the 
final time T = 1, the initial von Mises distribution has center p = 0 and wideness k = 400, 
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the drift of the stochastic process is 6 = 0 and the Gaussian volatility is a = a/0.02. The 
setting for the numerical solution is: space grid size N = 420, time grid size = 250. 
The setting for the optimization is critical, we found the following parameters by the 
experience: initial approximation of the parameter rates do = (0.1,0.1,...), constant of 
the Armijo condition 6 = 0.1, initial step-length of point 2. of Algorithm is set to 
= 0.5 and shrink by a factor 0.3, = 0.3^*,. 

As a hrst test, we perform a £t for a set of L = 10® values generated by a Monte Carlo 
algorithm for a simulated Levy process on the circle, with the following hve values of 
the jump rates: a = {3,2,1,0.5,0.25}. We solve the htting problem, i.e. calculating the 
estimates to ai,..., for different numbers of interpolatory functions: Nq = 3,... ,7. 
The center 6*i,..., ^tvq of the basis functions Qj{x) are equally spaced in the domain 
(—1,1) at the places 9j = —1 -|-jA, j = 1,... ,Ne, A = 2 /{Nq + 1), this means the basis 
functions do not cover all the domain G. In the following table the calculated value of 
{aj} for each problem are reported versus Nq 



Nq = 3 

o 

Nq = 5 

Nq = 6 

Nq = 7 

Ol 

3.4502 

3.1771 

2.9746 

2.8580 

2.8452 

02 

1.1089 

1.5577 

1.8100 

1.9922 

2.1003 

O 3 

0.4505 

0.6576 

1.0198 

1.3025 

1.5083 

04 


0.3362 

0.4951 

0.7607 

1.0077 

O 5 



0.2490 

0.3946 

0.6137 

06 




0.2042 

0.3428 

CXj 





0.1847 


We see the good match for Nq = 5 with the original rates d. In Figs, 
also appreciate the good data htting between the calculated PDF and the histograms of 
the simulated Monte Carlo data, for the proposed optimization problem with Nq = 3, 5, 6. 

Another interesting problem is the selection of the number of parameters Nq and the 
corresponding basis functions Qj for the best data £t. In Fig. |^we depict the result of 
the Akaike’s Information Criterion (AIC) [I3], given by 

AIC{Nq) = LJ{f, a*) - log(iVe). (50) 

A common choice in statistics is to pick that parametrization that maximises the AIC. 
We can see that criterion gives the value NQ^opt = 6, while the correct value is 5. The 
difference in the AIC is however rather small for Nq between 5 and 7. 


Tp and^ 


we can 


Fitting Data from a bi-directional gamma process. In the second test we £t the 
hnal position at T = 1 of 10® samples of a stochastic process with the jumps distributed 
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Figure 1: Left. Result of the data fitting with iV© = 3 rates. Histograms: experimental 
Levy data collected in 40 bins. Solid line calculated PDF. Right. Dashed line calculated 
PDF with the original 5 rates. 




Figure 2: Result of the data fitting with iV© = 5 rates. Histograms: experimental Levy 
data collected in 40 bins. Solid line calculated PDF. Right. Dashed line calculated PDF 
with the original 5 rates. 




Figure 3: Result of the data htting with iV© = 6 rates. Histograms: experimental Levy 
data collected in 40 bins. Solid line calculated PDF. Right. Dashed line calculated PDF 
with the original 5 rates. 
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Figure 4: Test for the appropriate regularisation with Akaike’s Information Criterion 


according a bi-directional gamma process with Levy measure z/ on M given by the density 

[ 2 ] 

g-/3bl 

du(s) = A ds. (51) 

|s| 

Here A > 0 is the so-called shape parameter and jd is the rate parameter. Note that this 
is not a hnite measure, so we are out of the compound Poisson class, and the trajectory 
of the bidirectional gamma process as inhnitely many (small) jumps. In [2] only the 
unidirectional Gamma process is described. Let Y^[t) be such a unidirectional gamma 
process, then the Levy measure is 

p-hs oAt 

dz/+(s) = A^—l{s> 0 }is)ds and /y+(p(i/) = ^;^^ 2 /^*“^e"^^l{y>o}(|/). (52) 

Let thus Y^{t) and Y~(t) be two independent copies of the Gamma process, then 

r(«) = K+(«)-y-(() (53) 


is our bi-directional gamma process, which is the jump part of our Levy process that 
also includes diffusion as in the hrst experiment. If we project Y{t) to the torus [—7r,7r], 
the effect on the projected Levy measure see (|^, of the projected Levy process 
X{t) = cl>{Y{t)) is 


d(j)^u{s) = I A 


vn=0 


Q-h{\s\+nTT) 

|s| -|- UTT 
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Figure 5: Left, AIC test for the Gamma test. Right, ht with iV© = 9 basis functions, 
with s, a) the Lerch transcendent, for p = |s|/7r and q = /dvr, we get 

d(j),u{s) = ( e-^", 1, ds (56) 

TT \ TT / 

as the Levy measure on the torus for the projected bi-directional Gamma process. 

In the simulations data are generated by scaling the rate parameter such that 1//3 = 1, 
and setting the shape to A = 0.5. Finally, the data V(t) are projected to the torus 
[—7r,7r). In Fig. we report the result of the AIG test the £t with JVq = 9 basis 
functions centered to dj = (—1 -|- 2(j — l)/9)7r, j = 1,... 9, whose the calculated rates are 
d = (0,0.0417,0.0235,0.1529,0.8827,0.8616,0.1517,0.0316,0.0396). We conclude that 
our procedure results in high quality hts, even for Levy distributions that are not part of 
our hierarchy of parametrizations, but can only be approximated by these. 

Financial Data. As an example from a real world problem we report the result of 
htting of the German Stock Exchange (DAX) index, which is publicly available, e.g. 
from the Yahoo Finance, see Figure left panel. Within the data of all closing quotes 
between April 1998 and March 2015, there are several periods of elevated volatility, so 
called volatility bursts. Obviously, this contradicts the description of the market with 
an exponential levy model S(t) = [21 ESI El], as the statistical law of Y(t) — Y(s) 

does not only depend on f — s. In order to avoid the pitfalls of time dependent (or 
stochastic [I5l [22|) volatility, we identify a period of comparatively stable volatility of 
1000 trading days between April 1998 and February 2002, see the right panel of Figure 
This data set has a small drift value b = 6.787 x lO”'^ which corresponds to the 
increased value of the stocks of 67% nominal interest rate in 1000 trading days (followed 
by severe losses in the subsequent period). The empirical daily volatility (i.e. standard 


25 






















3000 4000 

5000 

6000 

0 

200 

400 600 

800 

1000 

Trading Days 





Trading Days 




Figure 6: Cumulative and daily log-returns of the DAX index between 26/11/1990 and 
05/03/2015 (left panel) 16/4/1998 and 4/02/2002 (right panel). The vertical lines on 
the left panel correspond to the time period on the right panel, downloaded from £- 
nance.yahoo.com 


deviation of daily log-returns) in this period of time is cr = 9.304 x 10“^. The obvious 
absence of axial symmetry prohibits a Gaussian (Black-Scholes) market model from the 
outset. Our goal is to find a suitable description of this sample with an exponential Levy 
market model from our hierarchy of parametrizations. The data has been mapped to 
the [—vr, tt] torus, by rescaling and ’wrapping’ the daily log-Returns below/above 3%, i.e. 

= [—0.03,0.03]. Three data sets, all of them negative, were situated outside this 
band The drift value is adapted to a re-scaled torus [—7r,7r). Thus the drift on the 
torus of length 27r is 5 = 6.787 x lO'^^Tr/O.Od = 0.07017. 

One fourth of the total empirical variance 8.655 ■ 10“® is attributed for the ’fixed’ 
diffusion, compare Section which yields a coefficient for the Laplace operator equals to 
TT^(empirical Variance)/(0.03)^/4/2 = 0.2372 on the torus rescaled to [— vr, tt). 

With this setting we calculated the fitting of the distribution, with equally spaced 
basis functions in the interval [— tt, vr). In Fig. (left panel) we report the result of 
the AIC test and the fit with Nq = 6 (right panel). The selected basis functions are 

^ Note that this corresponds to a loss of 10% over four years being wrapped to the positive side. If 
the data is left skewed, as in the present sample, this might well introduce a bias in risk estimation to 
the optimistic side, if the procedure is used ’as is’. This can be mitigated with a larger torus such that 
no wrapping occurs, e.g. [riaiB;,] = [—0.05,0.05], see Section]^ At the present stage, it is however not 
the intention of this work to provide a ready to use basis for risk estimates for financial applications. 
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Figure 7: Left, AIC for the maximum likelihood estimate as a function of parameters for 
the DAX data (left). Right density for the Maximum-Likelihood £t with Nq = 6 basis 
functions, corresponding to the maximal AIC. 

centred at 9j = (—1 (j — l)/3)7r,j = 1,... 6, whose the calculated rates are a = 
(0,0,0.484,0.223,0.304,0). Although only three parameters are different from zero, the 
AIC is maximized at Nq = 6. That the AIC at Nq = 3 is lower is explained by the fact, 
that the more localized basis functions in the Nq = 6-basis are more adequate to fit the 
data. It is also a misinterpretation that the chosen parametrization misses an effective 
description with three parameters, since the position of the grid points are additional 
parameters. Note that the zero entries of the 1st, 2nd and 6th slot a actually correspond 
to small positive values and only represented as zero when rounded to the 3rd digit. 


6 Conclusion and Outlook 

In the present article, we demonstrate the use of optimal control for PIDE for the non 
parametric estimation of Levy processes. Here the PIDE is given by Kolmogorov’s forward 
equation which allows one to calculate the terminal distribution of the Levy process at 
time T. The objective functional is the log-likelihood evaluated on a sample of terminal 
values of the Levy process. 

Based on the study of Levy distributions, we set up approximate estimation problems 
that can be tackled by maximum likelihood estimation along with model selection based 
on Akaike’s information criterion (AIC). As the density of Levy probability distributions in 
most cases can not be determined analytically, numerical solutions of the Kolmogorov for¬ 
ward equation (Fokker-Planck equation) and its backward (adjoint) analogue are needed 
for the efhcient maximization of the log-likelihood functional. 
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For the numerical solution of the optimality system we used the Chang-Cooper method 
with a mid-point quadrature rule and second order backward time differentiation formula. 
This numerical scheme is second order accurate and conservative, and we found conditions 
for stability and positivity of the numerical solution. We use a non linear conjugate 
gradient method to find the optimality condition. 

We have shown that this method works for spline discretizations of the density of the 
Levy measure with symmetric boundary conditions for up to 11 parameters. The results 
consistently fit simulated data from the family of discretizations itself. The same turns out 
to be true from Levy processes that only can be approximated by such discretizations, 
if the number of parameters goes to inhnity, like the gamma process. Here the AIC 
provides an effective mechanism to choose an adequate discretization at a given sample 
size. Finally, we have demonstrated that also real-world, financial data can be effectively 
fitted using our strategy. 

The future potential of this solution lies in the fact that, unlike FFT / spectral based 
calibration procedures that are widely used in hnancial engineering (ansi ED, the present 
approach naturally generalizes to processes that originate as the solution of Stochastic Dif¬ 
ferential Equations (SDE) with state dependent coefficients. Such local volatility models 
are frequently used in contemporary hnancial engineering. 

In this work, we used historic and low frequency data for non parametric model cal¬ 
ibration. High frequency historical data and implicit volatility data US [Ml are natural 
candidates to set up new objective functionals for related control problems that go beyond 
the control of the terminal distribution. 

Another relevant problem is the notorious occurrence of local minima in the maximum 
likelihood estimation. We expect this to be more severe, when the number of parameters 
signihcantly increases. An interesting hybrid approach would combine the robustness of 
non-parametric spectral calibration methods as a sort of pre-conditioner with the highly 
efficient maximum likelihood estimation. 
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